Introduction

The following is intended as a set of tips for people learning how to use Git and GitHub.

Session plan

  • Tips
  • Short practical on making a pull request
  • Elsie - the OpenSAFELY codelist system

Session aims

By the end of the session you should

  • have a basic understanding of how Git works
  • be able to perform the following common Git operations using GitHub Desktop
    • clone a repo from GitHub
    • make a new branch
    • make commits
    • push your branch to GitHub
    • make a pull request

Guides to Git and GitHub

There are many excellent guides to Git and GitHub online, e.g.,

  • Intro to GitHub here
  • GitHub Training & Guides YouTube channel here
  • Git documentation and training here
  • Hadley Wickham on Git here
  • Jenny Bryan on Git and GitHub with R here

And most relevantly the OpenSAFELY documentation here.

These tips are meant to supplement them.

Tips

Intro to Git

  • Git was written to allow developers to work on the source code of the Linux kernel
    • During one kernel release they got into a terrible mess
    • This provoked Linus Torvalds into action
    • For an excellent insight into his thinking watch this talk he gave at Google here
    • Git was designed to work with text files
    • Git can be intimidating to use, especially at the command line, and its error messages can be quite cryptic (like those of LaTeX and R)
  • A Git repository is a folder/directory on your computer which has been Git initialised
    • Using either the command line

      git init mynewfolder
    • Or GitHub Desktop

    • Repos on GitHub are already Git initialised

      • When you clone them down to your computer they work in GitHub Desktop
  • Git is commonly referred to as version control software
  • Git is better described as a content-addressable filesystem, i.e., Git tracks the contents of the files in your repo
    • Git looks for changes in your files when you save them, so while a file has unsaved changes Git shows no changes until you save it

    • Git takes snapshots of your files when you tell it to; these snapshots are called commits. For example, having saved the file from above, enter a commit message and click “Commit to master”

    • Each commit is identified by a 40-character SHA-1 hash (checksum) computed from the contents of your files at that time together with the commit’s metadata (author, date, message, and parent commit)

    • Git knows the state of your files at every commit

      • Git can easily restore your files to a previous state
    • For Git the state of your files only changes when their contents change

      • If you reopen a file, make no changes, then resave it, Git will show no changes
      • If you add an empty folder/directory to your repo Git will detect no changes in your repo
      • This differs from OneDrive/SharePoint/Google Drive, which are file synchronisation systems
    • I recommend not placing your Git repos in a location that is synced by OneDrive or Google Drive (their syncing technologies are very different from Git)
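
The content-tracking behaviour described above can be checked at the command line. The following is a minimal sketch, assuming Git is installed; the folder and file names (demo, empty-folder, a.txt, b.txt) are made up for illustration.

```shell
set -eu

# Scratch folder (hypothetical names throughout); removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init -q demo
cd demo

# An empty folder has no contents to track, so Git reports nothing.
mkdir empty-folder
status_with_empty_folder=$(git status --porcelain)
echo "status with only an empty folder: '$status_with_empty_folder'"

# Two files with identical contents get the same object ID,
# because the ID is a hash of the contents, not of the file name.
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt
hash_a=$(git hash-object a.txt)
hash_b=$(git hash-object b.txt)
echo "a.txt: $hash_a"
echo "b.txt: $hash_b"

# Changing the contents changes the ID.
printf 'hello again\n' > b.txt
hash_b_changed=$(git hash-object b.txt)
echo "b.txt after edit: $hash_b_changed"
```

The two identical files print the same ID, and the edited file prints a different one, which is what "content addressable" means in practice.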

The .git folder

  • When you initialise a directory the .git folder is created
  • This contains all of the files Git uses to track the contents of your files
  • Here is the .git folder of a repo on my computer (I have selected “View hidden files” in Windows Explorer)
  • Confusingly, GitHub hides the .git folder from view
  • Here are its contents; never edit these manually
  • Explanation of these is (from here)
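
If you would like to look inside a .git folder yourself, the following is a minimal sketch, assuming Git is installed. It uses a throwaway repo (the name demo is hypothetical) so that nothing in a real repo is touched; the exact entries listed can vary between Git versions.

```shell
set -eu

# Scratch folder; removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init -q demo
cd demo

# -A lists hidden entries too; .git is hidden because its name starts with a dot.
ls -A

# Typical entries include HEAD, config, objects/ and refs/.
ls .git

# HEAD records which branch is currently checked out.
cat .git/HEAD
```

Looking is safe; as noted above, editing these files by hand is not.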

Common Git commands

  • I recommend you use GitHub Desktop instead of these commands
  • These commands are what GitHub Desktop is using behind the scenes
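  • Git is the name of the program; git is the name of the executable available at your command line
git init                                             # turn the current folder into a Git repo
git add <filename>                                   # stage a file for the next commit
git status                                           # show changed, staged and untracked files
git commit -m "Your commit message"                  # record the staged changes as a commit
git commit --amend -m "Your amended commit message"  # replace the most recent commit, e.g., to reword its message
git push                                             # send your local commits to the remote (e.g., GitHub)
git pull                                             # fetch remote commits and merge them into the current branch
git clone <url>                                      # copy a remote repo to your machine
git branch                                           # list branches (git branch <name> creates one)
git checkout <branch>                                # switch to another branch
git merge <branch>                                   # merge another branch into the current branch
git fetch                                            # download remote commits without merging them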

Installing Git and GitHub Desktop

Installing Git

  • Windows
    • Download and install from here
  • macOS comes with an outdated version of Git
    • I recommend installing the Homebrew version

    • First install Homebrew, see instructions here

    • Then run in your Terminal app

      brew upgrade
      brew install git
    • Additionally on a Mac it is helpful to install the Xcode command line tools (this avoids installing the whole of Xcode)

      xcode-select --install
      • You must reinstall these every time you upgrade the operating system version, e.g., from Big Sur to Monterey
  • Once Git is installed its executable (called git) should be available at your command line
    • Check which version you have with (you want something recent-ish)

      git --version
    • On my Windows machine I have

      git version 2.33.1.windows.1

Installing GitHub Desktop

  • You could use Git through its command line syntax; however, I recommend you use a graphical Git client
  • For Windows and macOS download and install GitHub Desktop from here
  • A Linux version of GitHub Desktop is available from here
  • I recommend installing the free VSCode text editor, from here, and setting that as the “External editor” in GitHub Desktop options (Click: File | Options…)
  • On Windows I also recommend installing Windows Terminal from here

Intro to GitHub

  • GitHub is a Git web server; there are others, e.g., GitLab

  • Your repositories will be stored on GitHub, and you will clone them to your machine to work on them (or work on them in Gitpod)

  • Under your user account you see the repos you own

  • On GitHub OpenSAFELY is an organization

    • The repos are owned by the organization so they show up under the organisation here

GitHub PAT for R

  • A GitHub Personal Access Token (PAT) allows you more downloads from GitHub per hour; to create one, run in R
install.packages("usethis")   # only needed once
library(usethis)
create_github_token()         # opens GitHub in your browser to create the token

GitHub CLI

  • GitHub CLI is a command line interface (CLI) for operating GitHub
  • Installation instructions are here
  • But I don’t recommend using this

Git and GitHub Workflow

Standard GitHub workflow

  • (I recommend only forking a public repo if you intend to send a pull request to it)
  • Fork the other person’s repo (from your fork, their repo is known as upstream; your copy of the repo on GitHub is known as origin)
  • This creates a copy of their repo under your account (your fork)
  • Clone your fork (the copy under your account) to your machine
  • Create a new branch (do not work on master/main)
  • Make your changes and commit them
  • Push your new branch up to GitHub (i.e., to your fork)
  • Create a pull request (from your new branch) back to the default (master/main) branch of the original repo
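
The steps above can be rehearsed without touching GitHub. The following is a minimal sketch, assuming Git is installed: two local bare repos (upstream.git and fork.git, names of my choosing) stand in for the original repo and your fork, and the identity, branch name and file contents are placeholders. The pull request itself can only be made on GitHub, so the sketch stops after the push.

```shell
set -eu

# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for the other person's repo on GitHub ("upstream").
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "# Their project" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed upstream.git

# "Fork": a copy of upstream under your account.
git clone -q --bare upstream.git fork.git

# Clone your fork; Git names that remote "origin" automatically.
git clone -q fork.git my-clone
cd my-clone
git remote add upstream ../upstream.git
git fetch -q upstream

# Work on a new branch rather than on main.
git checkout -q -b fix-typo
echo "A small improvement" >> README.md
git commit -q -am "Improve README wording"

# Push the branch to the fork; the pull request would then be opened on GitHub.
git push -q -u origin fix-typo
git remote -v
```

Note that the new branch ends up only in the fork; the upstream repo is unchanged until its owner merges the pull request.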

Workflow with an OpenSAFELY GitHub repo

  • Skip the forking step from the standard GitHub workflow
  • The repo on GitHub is known as origin
  • Clone the repo to your local machine
    • Click: Code | Open with GitHub Desktop
    • Click Clone in the box which appears in GitHub Desktop
    • In GitHub Desktop (i.e. locally) make a new branch
  • Do some work
    • Make some changes (to your project.yaml/study_definition.py/R scripts)
    • In GitHub Desktop select relevant changed lines and make small-ish commits with sensible commit messages
    • Do not commit changes to many files with a single commit message such as “Edits”!
    • Note that in a commit we can see the added lines (green highlight with a + prefix) and deleted lines (red highlight with a - prefix)
  • Push your new branch from your local machine up to GitHub
  • Make a pull request from your branch to the default branch

Making a pull request

  • Let’s start by creating a new branch
  • We do some work and make a new commit which adds the new file to the repo
  • Next publish the new branch to GitHub
  • Now initiate the creation of the PR by either clicking in GitHub Desktop “Create Pull Request”
  • or clicking on the button on the repo webpage “Compare & pull request”
  • Edit the title box, add some extra text in the comment box, select a reviewer, and then click “Create pull request”
  • You can amend/edit pull requests by modifying/adding commits to the branch from which you sent the PR
  • See more about pull request reviews here
  • Merge PR
  • Confirm the merge
  • (Optional) Delete the branch the PR came from
  • The PR is now finished and we can see the merge commit in the default (main/master) branch
  • In GitHub Desktop click “Fetch origin”/“Pull origin” to pull the updated main/master branch down to your local machine … and the process begins again …

Common errors

Forgetting to pull down the latest changes from GitHub

  • (Especially in the morning) It is very easy to forget to pull down latest changes when reopening a project
  • Let’s say I or a colleague made changes and those are pushed to GitHub
    • The next day I restart work on a different computer, and GitHub Desktop will show, for example,
  • But I forget to click “Pull origin”
  • If you make commits on a branch that has commits on GitHub which you have not yet pulled, you may get a merge conflict when you eventually click “Pull origin”
  • You could resolve the conflict, e.g., in VSCode
  • A sign that this can happen is seeing both up and down arrows in the “Pull origin” box (but not always)
  • Fix
    • Move your changes to a new branch

    • Move back to master/main and undo the changes there, so that the files show no changes

    • Pull down the changes from GitHub

    • Merge changes from your new branch into the main/master/relevant branch
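
The fix above can be tried out safely on local stand-ins. The following is a minimal sketch, assuming Git is installed: a local bare repo (origin.git) plays the part of GitHub, and `git reset --hard` is my choice of one possible way to undo the changes on main. The identity, folder names, branch name and file names are placeholders. Note that `git reset --hard` discards uncommitted changes, so commit your work first.

```shell
set -eu

# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for the repo on GitHub, plus two clones: a colleague's and mine.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "shared" > shared.txt
git add shared.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed origin.git
git clone -q origin.git colleague
git clone -q origin.git mine

# A colleague pushes a commit that I have not pulled yet.
cd colleague
echo "colleague's work" > theirs.txt
git add theirs.txt
git commit -q -m "Add colleague's file"
git push -q origin main
cd ../mine

# I commit on main without pulling first.
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "Add my file"

# Fix: keep my commit on a new branch, then put main back to its last fetched state.
git branch my-work
git reset -q --hard origin/main

# Pull the colleague's commit, then merge my work back in.
git pull -q --ff-only origin main
git merge -q --no-edit my-work
git log --oneline --graph
```

Because the two commits touch different files here, the final merge goes through cleanly; if they had edited the same lines you would still need to resolve a conflict at that step.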

Merge conflict

See

  • About merge conflicts here
  • Resolving a merge conflict here

OpenSAFELY repositories

  • OpenSAFELY is a system of Python packages (opensafely and cohortextractor) which run various Docker containers
    • The main GitHub organisation page is here
    • All the core code is published in their opensafely-core organisation on GitHub here
    • And there is also their opensafely-actions organisation here
  • A Docker container is like a virtual machine
    • It defines the operating system and programs running within it
    • On my Windows 10 machine I can run an Ubuntu Docker container
    • Just because an R package is installed in the R installation on your machine does not mean that it is installed in the OpenSAFELY R Docker container
      • See the list of packages in the R Docker container here

Demo repo

  • Have a look at the demo repo here

Getting started

  • See the OpenSAFELY (OS) page here
  • If creating a new repo, create it from the OS template here
  • This is already Git initialized
  • Important files
    • project.yaml
      • Defines the jobs and the order in which they run
    • /analysis/study_definition.py
      • Defines the study population extracted from the OpenSAFELY database
      • This should return one or more .csv files of data to read into R
    • /analysis/##_R-scripts.R
      • Your analysis scripts

Running jobs (on the dummy data)

  • In your OS repo online
    • Use Gitpod
  • On your own machine - install the following free software
    • (If on Windows - Windows Subsystem for Linux version 2)
    • Docker Desktop
    • Python
    • Git
    • GitHub Desktop
    • VSCode text editor

Additional topics

Writing good commit messages

  • Follow the standard recommendations about making commit messages, see

Files for Git to ignore

  • You should not commit every file in the folder on your computer to your repo
  • The .gitignore file is a list of files and folders in your repo for Git to ignore
  • Common files to ignore are
    • .Rhistory
    • .DS_Store
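
Here is a minimal sketch of a .gitignore in action, assuming Git is installed. It uses a throwaway repo, and analysis.R is a made-up file standing in for something you do want to commit.

```shell
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init -q demo
cd demo

# Two files we do not want in the repo, plus one we do.
touch .Rhistory .DS_Store analysis.R

# Each line of .gitignore is a file name or pattern to ignore.
printf '.Rhistory\n.DS_Store\n' > .gitignore

# Only .gitignore itself and analysis.R are reported as untracked.
status_output=$(git status --porcelain)
echo "$status_output"

# check-ignore shows which rule matches a given path.
git check-ignore -v .Rhistory
```

The .gitignore file itself is normally committed, so that everyone who clones the repo ignores the same files.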

GitHub repos contain more than just code

  • A repo for an R package will probably contain
    • The code for the R package
    • The code for its website (often made with pkgdown and hosted with GitHub Pages or Netlify)
    • Scripts for controlling continuous integration services such as GitHub Actions

Short practical

  1. On GitHub:
    1. Go to our test repo (in our test organization) here
    2. Clone the repo to your local machine
  2. In GitHub Desktop: make a new branch and switch to it
  3. In any text editor:
    1. Create a new markdown file called firstname-lastname.md
    2. Add a sentence or two to the file about yourself
    3. Save this file into the (top level of the) repo
  4. In GitHub Desktop: Commit this new file into your new branch
  5. In GitHub Desktop: Push your new branch up to GitHub
  6. On GitHub: Open a pull request from your branch to the main branch in which you select a reviewer (Tom/Venexia/Elsie)
  7. In your text editor and GitHub Desktop: Make any changes requested by the reviewer and add these to your PR - hopefully your pull request will then be merged by the reviewer!
  8. On GitHub: Delete the branch you made your pull request from
  9. In GitHub Desktop: Pull down the updated main branch to your machine … in a real workflow you would then make another new branch and do more work…